Recent trends in college admissions are discussed in
terms of their influence on test validation procedures. In particular,
the effects of new college admission practices are considered with
respect to the problem of validating placement tests. Traditional
test validation techniques are reviewed and compared to the needs
specific to placement tests; an example of a placement test
validation is presented. Content validity and trait-treatment
interaction analyses are stressed in the example analysis, and the
possibility of application of decision-theoretic utility models is
introduced. In the example analysis, disordinal trait-treatment
interactions were found in three colleges. (Author)
VALIDATING PLACEMENT TESTS
Hunter Breland and David Rogosa¹

Educational Testing Service     Stanford University

Recent emphases on placement instruments arise from a trend toward the
admission into many colleges of students who do not have traditional academic
skills. These kinds of admission practices result in a need for special
programs suited for handling students of diverse background and preparation. In
discussing this problem, Willingham (1974) points to several shifts that have
occurred since the late 1950's in the way higher education has adapted to
individual differences among students. In the post-Sputnik era there was a
fascination with high-level scientific talent, in the mid-1960's there was a
shift to more generally selective admissions because of a population bulge in
the 18-year-old age group, and in the late 1960's open admission policies came
about in response to societal demands. Willingham then notes that:

More recently it has become clear that access is not enough and
that an equally critical problem is how to provide a useful education
for students with very different needs and very different backgrounds,
i.e., how to deal effectively with wide individual differences that
result from free-access policies. From the standpoint of assessing
individual differences, the emphasis has changed from identifying
students to determining how to educate them. Turnbull (1974) has
called it a shift from "which" to "how" (p. 1).

Colleges throughout the country are now experimenting with methods for
handling the diversity of entering students. Remedial and compensatory
programs, mastery teaching, and personalized systems of instruction represent
some of the approaches being tried. Which of the various placement, exemption,
and instructional techniques works best, however, is not yet known. But it is
clear that placement tests are needed to assist colleges with the instructional
problems they face. As new placement instruments are developed, a need arises
to study how they may be used most effectively and how they might be improved.
This paper is based upon experiences obtained during studies of an
experimental English placement test. The following sections describe methods
used, results obtained, and caveats and suggestions for future studies of
placement tests.

¹Based upon research conducted while the junior author was a Summer Fellow at
the Educational Testing Service, Princeton, New Jersey.

Content Analyses 

If a test is to be used to identify students who have mastered certain
material, it is important that it adequately cover relevant topics. A proper
placement test will assess a domain of knowledge, skills, and aptitudes that
is taught in some specific course of instruction or sequence of instruction.
Given a large domain and limited time in which to assess it (which is usually
the case), judgments must be made to determine what content is most important
and what content is best measured within the confines of a particular form of
assessment (e.g., a multiple-choice paper-and-pencil test). To make decisions
of this type, committees of subject-matter specialists convene to provide a
broad perspective of the domain in question. National surveys are also usually
necessary to determine the most equitable representation of topics for a
particular test. Following the construction of a test, other groups judge its
representation of the domain of interest as well as its appropriateness for
particular applications.

To learn what college English teachers thought of the content of the
experimental English placement test being studied, questionnaires were sent to
200 English professors in 200 different colleges. The results of this content
study showed that, despite considerable controversy within the English teaching
profession over what should be taught (as judged from journals and national
conventions), there was surprising agreement when professors were asked to rate
specified areas of instructional content. The experimental English placement
test fared well by this analysis.
Correlational Analyses

Various researchers have questioned the utility of traditional
correlational analyses for the study of placement tests (Cronbach, 1971; Hills,
1971; Snow, 1972). Nevertheless, traditions persevere and one may expect that
many will continue to consider the correlation coefficient as an important
element in any test analysis. Table 1 presents a matrix of correlations
relevant to the study of the experimental English placement test (abbreviated
as EEPT in Variables 10 and 12 of Table 1). Consideration of the EEPT Pretest,
Variable 12, shows reasonable correlations with variables the test would be
expected to relate to. Note, especially, the correlations of .39 with Fall
Grades², .43 with an Essay Pretest, .42 with the Essay Posttest graded
holistically, .52 with the Essay Posttest graded for grammar, usage, and
sentence structure (but abbreviated as simply Grammar in Table 1), and .64 with
SAT-Verbal. Observe, also, that the best predictors of the Essay Posttest score
(administered in the spring of the freshman year), Variable 4, were Variables
11, 12, and 13 (CLEP English Composition, the EEPT Pretest, and the SAT-V
Pretest, respectively), all administered at the beginning of or prior to the
freshman year of instruction. High School Rank, whether self-reported or
college-reported (Variables 9 and 14), tends to have lower correlations with
important outcome variables.

While these correlational analyses are interesting, and suggestive of the
usefulness of English placement tests like that being studied, temptations to
make too much of them should be avoided. Cronbach (1971, p. 500) has asserted
that "A 'validity coefficient' indicating that test X predicts success within a
treatment tells nothing about its usefulness for placement." In the sense that
regular freshman English composition is a "treatment," such a view is applicable
in the present study. Other writers have espoused a position similar to that of
Cronbach (e.g., Snow, 1972; Hills, 1971; Thorndike, 1971).

²For short-sequence students, that is, regular freshman English students as
contrasted to students placed in a longer sequence for purposes of
remediation, compensatory programming, or vertical sectioning.

Trait-Treatment Interaction Analyses

Trait-treatment interaction (TTI) analysis is considered by some researchers
to be the most useful method for analyzing placement tests (Cronbach, 1971;
Willingham, 1974). In the instructional setting, an interaction implies that
the advantage of a long-sequence of instruction over a short-sequence of
instruction varies according to the level of the placement test score obtained
prior to instruction.³ The notion of TTI is inherent in the logic of placing
students with different levels of knowledge in different educational treatments.
The question of interest in a placement situation is "Will a student be better
off in the normal treatment or in an alternative treatment?" An answer to this
question clearly requires information comparing outcomes for a particular
placement score for both the conventional and the alternative treatment groups.

The importance and usefulness of the TTI is best understood by examining
the regression of the end-of-sequence criterion on the placement test for the
two groups of interest: (1) students placed in long-sequence instruction, and
(2) students placed in short-sequence instruction. In the optimal case, the
regression lines will differ substantially between treatments, as shown in
Figure 1. Note that the regression line for the long-sequence group is relatively
flat, while the regression line for the short-sequence group is steeper. The
advantage of placement into the long-sequence for those on the lower portion of
the placement test scale is apparent from an inspection of the differences in
the regression lines. Regression line C-D, in Figure 1, represents what might
be expected if students were randomly placed (regardless of placement test
score) in a special long-sequence of instruction, such as a remedial English
course or some kind of compensatory programming. The end-of-sequence criterion
might be course grades at the completion of a regular course in, say, freshman
English. The regular course represents the end of the long-sequence; the
remedial course is only the first part of the long-sequence. Therefore, grades
in the remedial course are not appropriate for use as a criterion in the TTI
analyses. Line C-D shows that students with lower scores on the placement test
performed better at the end of instruction than did similar students who were
placed in the short- (regular) sequence represented by line A-B. The regression
lines of Figure 1 are, of course, hypothetical. These kinds of outcomes will
not occur unless the placement instrument is finely tuned to the instruction,
especially to the remedial instruction.

[Figure 1⁴. Illustration of the TTI assumption in the case of placement.
Horizontal axis: Placement Test, Low to High.]

³The term, long-sequence, includes both remedial instruction of longer duration
than regular (short-sequence) instruction and non-remedial instruction of
longer duration than regular instruction.

⁴Adapted from Willingham (1974).

Methodology for TTI Analysis. The analysis of TTI data is a process of
comparing the within-treatment regressions of a suitable criterion variable on
the placement test. Non-parallel regression lines indicate that a
trait-treatment interaction exists. There are different kinds of interactions,
however. Using the language of Cronbach and Gleser (1965), ordinal interactions
are indicated by non-parallel lines which do not intersect in the range of
interest, whereas disordinal interactions are indicated by lines which intersect
in the range of interest. Clearly, disordinal interactions (see Figure 1) are of
primary interest for placement decisions. Assuming a valid criterion variable
is available, the point of intersection provides a straightforward cutting
point for assignment to alternative educational treatments.
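The ordinal/disordinal distinction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `classify_interaction`, its arguments, and the tolerance `tol` are hypothetical and do not come from the paper; the lines are taken as already estimated, and sampling error is ignored.

```python
def classify_interaction(a_short, b_short, a_long, b_long, x_min, x_max, tol=1e-12):
    """Classify two within-treatment regression lines y = a + b*x.

    Returns (kind, x_cross), where kind is "none" (parallel lines),
    "disordinal" (lines cross inside [x_min, x_max]) or "ordinal"
    (lines are not parallel but cross outside that range).
    All names here are illustrative assumptions, not the authors' notation.
    """
    if abs(b_short - b_long) < tol:
        return "none", None  # parallel lines: no interaction
    # Solve a_short + b_short*x = a_long + b_long*x for x.
    x_cross = (a_long - a_short) / (b_short - b_long)
    if x_min <= x_cross <= x_max:
        return "disordinal", x_cross
    return "ordinal", x_cross


# Hypothetical lines: a steep short-sequence line and a flat long-sequence line.
kind, x_cross = classify_interaction(0.5, 0.05, 2.0, 0.01, x_min=20, x_max=60)
print(kind, x_cross)
```

The "range of interest" is passed in explicitly because the same pair of lines can be ordinal over one score range and disordinal over another.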

The statistical comparison of two regression lines requires that the two
groups have similar distributions. One measure of importance is the variance
about the regression lines (residual variance); the residual variances should
be equal or nearly equal for proper comparison of the regression lines.
Large-sample tests of the hypothesis of equal residual variances are provided by
Gulliksen and Wilks (1950) and Stroud (1972).



If the residual variances are not significantly different, the next step
is the test for equal regression slopes. Establishing a difference in
regression slopes is the key evidence in the detection of a trait-treatment
interaction. Standard regression theory (e.g., Kendall & Stuart, Vol. II, 1967,
pp. 371-372) provides a t-test, or an equivalent F-test, for the significance of
the difference of the estimated regression slopes. Unfortunately, tests for
interaction have relatively little power (Cohen, 1969; Cronbach & Snow, in
press). Consequently, failure to reject the null hypothesis of equal regression
slopes, when samples are not large, cannot be regarded as conclusive evidence of
the absence of TTI. Examination of the estimated regression slopes and the
associated confidence intervals supplements the hypothesis testing and provides
a more detailed description of the data and of the likelihood of TTI.
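As a rough illustration of the slope comparison just described, the following Python sketch computes the textbook t statistic for the difference between two independent simple-regression slopes, using a pooled residual variance (that is, assuming equal residual variances, as in the preceding step). The function names `fit_line` and `slope_difference_t` and the data are hypothetical; the paper does not give its computations, so this should be read as one standard formulation rather than the authors' procedure.

```python
import math


def fit_line(x, y):
    """Least-squares fit of y = a + b*x.

    Returns (a, b, sxx, sse, n): intercept, slope, sum of squared
    deviations of x, residual sum of squares, and sample size.
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return a, b, sxx, sse, n


def slope_difference_t(x1, y1, x2, y2):
    """t statistic and degrees of freedom for H0: equal slopes.

    Assumes equal residual variances in the two groups, so the
    residual sums of squares are pooled over n1 + n2 - 4 df.
    """
    _, b1, sxx1, sse1, n1 = fit_line(x1, y1)
    _, b2, sxx2, sse2, n2 = fit_line(x2, y2)
    df = n1 + n2 - 4
    s2 = (sse1 + sse2) / df                      # pooled residual variance
    se = math.sqrt(s2 * (1.0 / sxx1 + 1.0 / sxx2))  # SE of b1 - b2
    return (b1 - b2) / se, df


# Hypothetical data: a steep group and a nearly flat group.
x = [1, 2, 3, 4, 5]
t, df = slope_difference_t(x, [1.1, 1.9, 3.2, 3.8, 5.0], x, [2.0, 2.1, 2.3, 2.2, 2.4])
print(t, df)
```

The statistic is referred to a t distribution with the returned degrees of freedom; its square is the equivalent F with 1 and `df` degrees of freedom.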

Investigating the possibility of a disordinal interaction is the final
step in the TTI regression analysis. Graphical inspection is often useful for
a rough determination. Statistical inferences can be made for the point of
intersection of the regression lines. If the true cutting score (the abscissa
of the point of intersection), which we denote as $x_o$, lies in the range of
interest of the predictor variable (the placement test score), then a useful
disordinal interaction exists. Robison (1964) demonstrated, under the assumption
of equal residual variances, that the maximum likelihood estimator of $x_o$ is

$$\hat{x}_o = \frac{a_1 - a_2}{b_2 - b_1},$$

where $a_i$ and $b_i$ are the estimated intercept and slope, respectively, for
the regression equation in group $i$. Kastenbaum (1959) derives confidence
intervals for $x_o$ (assuming normality) based on the t-distribution. The width
of this confidence interval is a measure of the precision and, therefore, the
usefulness of the cutting score. This confidence interval for $x_o$ is identical
with the region of nonsignificance obtained from the Johnson-Neyman Technique
with one predictor.
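The cutting score and a one-predictor region of nonsignificance can be sketched as follows. This is an assumption-laden illustration, not Kastenbaum's or the authors' computation: it takes summary statistics as inputs, pools the residual variance `s2`, and expects the caller to supply the critical value `t_crit` (for example from a t table with n1 + n2 - 4 degrees of freedom). The function and parameter names are hypothetical. The region is the set of x at which the squared difference between the two fitted lines does not exceed `t_crit**2` times its estimated variance, which reduces to a quadratic inequality in x.

```python
import math


def cutting_score(a1, b1, a2, b2):
    """Abscissa where y = a1 + b1*x meets y = a2 + b2*x."""
    return (a1 - a2) / (b2 - b1)


def nonsignificance_region(a1, b1, a2, b2, n1, n2, xbar1, xbar2,
                           sxx1, sxx2, s2, t_crit):
    """Interval of x where the two fitted lines do not differ significantly.

    Solves (d0 + d1*x)**2 <= t_crit**2 * Var(d0 + d1*x), with
    Var = s2 * sum_i [1/n_i + (x - xbar_i)**2 / sxx_i].
    Returns (lo, hi), or None when the slopes do not differ
    significantly at the chosen level (no finite interval exists).
    """
    d0, d1 = a1 - a2, b1 - b2
    k = t_crit ** 2 * s2
    # Coefficients of the quadratic A*x**2 + B*x + C <= 0.
    A = d1 ** 2 - k * (1.0 / sxx1 + 1.0 / sxx2)
    B = 2.0 * d0 * d1 + 2.0 * k * (xbar1 / sxx1 + xbar2 / sxx2)
    C = d0 ** 2 - k * (1.0 / n1 + 1.0 / n2
                       + xbar1 ** 2 / sxx1 + xbar2 ** 2 / sxx2)
    disc = B * B - 4.0 * A * C
    if A <= 0 or disc < 0:
        return None
    root = math.sqrt(disc)
    return (-B - root) / (2.0 * A), (-B + root) / (2.0 * A)


# Hypothetical summary statistics for two groups of five cases each.
x_hat = cutting_score(0.09, 0.97, 1.93, 0.09)
region = nonsignificance_region(0.09, 0.97, 1.93, 0.09, 5, 5, 3.0, 3.0,
                                10.0, 10.0, s2=0.11 / 6, t_crit=2.447)
print(x_hat, region)
```

When the slopes differ significantly the interval always contains the estimated cutting score, since the fitted lines coincide there; a wide interval signals an imprecise cutting point.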
Some Results of TTI Analyses

Regression analyses were performed on the schools in the study which
provided sufficient data. The variances about the regression lines for the two
placement groups for each of the schools were first examined. Most of the
standard errors of estimation were quite similar within schools and the null
hypothesis of no difference was tenable.

Table 2 shows the number of cases determining each regression line (N),
the estimated regression slopes (b), the standard deviations (s(b)), and the
estimated intercept of the regression lines (a). Also shown is the estimated
reliability of the experimental test for each group within each school
($r_{xx}$) along with the regression slopes corrected (b*) for the attenuating
effect of measurement error in the EEPT scores. Although the reliabilities
between schools vary considerably, the reliabilities between groups within the
schools are quite similar. This within-school similarity of estimated
reliabilities lends some credibility to the assumption that the t-test for
equality of observed regression slopes is a reasonable, albeit approximate,
substitute for an exact test for the equality of estimated regression slopes
for true scores.
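The paper does not state the formula used for the corrected slopes b*. A minimal sketch, assuming the classical errors-in-variables correction in which the observed slope is divided by the reliability of the predictor, might look like this; the function name `corrected_slope` and the example values are hypothetical.

```python
def corrected_slope(b_observed, reliability):
    """Disattenuated slope b* = b / r_xx (classical correction, assumed here).

    Measurement error in the predictor shrinks the observed slope by a
    factor equal to the predictor's reliability, so dividing by the
    reliability estimates the slope on true scores.
    """
    if not 0.0 < reliability <= 1.0:
        raise ValueError("reliability must lie in (0, 1]")
    return b_observed / reliability


# Hypothetical values: an observed slope of 0.04 and a reliability of 0.80.
print(corrected_slope(0.04, 0.80))
```

Under this correction, two groups with similar reliabilities have their slopes rescaled by nearly the same factor, which is why a test on observed slopes can serve as an approximation to a test on true-score slopes.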
Of course, the stability of the regression coefficients determined
depends strongly on the sample size. Consequently, significance tests for
differences of regression slopes will have much more power in the schools with
large sample sizes. Because of small sample sizes in some of the schools,
appreciable differences in the regression slopes will usually fail to be
significant. In this exploratory analysis, effects that were not statistically
significant were not disregarded, but conclusions from these effects were viewed
with appropriate caution.

Figure 2 illustrates a typical reporting of the results of a TTI analysis
of placement test data. For each school the within-group regression lines are
plotted. The indicated F-test for differences in slopes is performed to
establish the TTI effect. Finally, the estimated cutting score $x_o$ is shown.

[Figure 2. Regression lines for College C. Horizontal axis: EEPT Score.
Short-sequence group, N = 711; long-sequence group, N = 452. F = 6.58, p = .01.]

Notes:
1. An experimental English placement test.
2. The dots indicate the group means.
3. The F statistic indicated is for the hypothesis of equal regression slopes.
4. If b* is used to determine the regression lines, the point of intersection
   decreases slightly.
5. The symbols $a_s$, $a_l$, $b_s$, and $b_l$ represent intercepts (a) and
   slopes (b) for short (s) and long (l) sequence instructional groups.

Some Unsolved Problems in TTI Analysis. Implicit in decisions to place
students in different instructional treatments is the assumption that the trait
used for placement and the treatment interact. In other words, the assumption
is that optimum learning, or some other optimum outcome, is maximized by the
placement procedure. Implicit in all of these assumptions is one that is
perhaps less apparent. This is the assumption that, even if desired educational
outcomes are maximized by the placement procedures, there is also an economic
or other justification for the placement; for example, that it is worth an extra
allocation of financial resources to bring about optimum educational outcomes.
Although it is of considerable importance, an analysis incorporating the full
decision-theoretic framework, including costs, is beyond the scope of this
paper.

Despite the theoretical attractiveness of TTI, in practice it has not often
been as useful as hoped. More often than not, interactions of the type desired
fail to occur. Reasons why TTI's are difficult to conduct include:

(1) Bias and unreliability in common student performance criteria,
such as grades.

(2) Uncontrolled instructional variables. Interactions are most
likely to occur when instruction is closely tuned to the test.

(3) Problems specific to the curriculum structure. Willingham
(1974) observes that TTI effects will be best seen in a
"segmented sequence" of courses (e.g., mathematics courses).
In an "ordered series" of courses, such as in psychology and
English curricula, end-of-sequence performance criteria are
often insensitive to treatments occurring at earlier stages.
Even when interactions do occur, it is not always certain what
interpretation to make. The utility of outcomes is a function not only of
student performance but of other factors, including the cost of instruction. It
is impossible to construct a single criterion scale integrating even performance
plus cost, let alone still other important factors such as student satisfaction.
Nevertheless, to the degree that local administrators and instructors can
incorporate judgmental factors with the TTI analyses, the results can be useful.

From a methodological perspective, the present art of TTI analysis has
even further unsolved problems:

(1) Power. As has already been noted, the power of statistical
tests is low unless the N is high, but the difficulty of data
collections required for TTI analysis tends to reduce the N
available.

(2) Measurement Error. Since measurement error in the placement
test scores serves to flatten the regression slopes and
thereby mask TTI effects, correction for this attenuation is
desirable. However, this correction complicates the
statistical analysis since the distribution of the corrected slopes
is unknown if the reliability is estimated. If the cutting
score for the observed regression lines lies near the group
means, then it will be little affected by any measurement
error correction.

(3) Fixed Predictor Variables. Standard regression theory assumes
that the predictor variables are fixed; that is, the observed
values are predetermined and replicable from sample to sample.
Clearly, this assumption is violated by placement tests. The
inferences from the standard analysis are then conditioned on
the observed values of the placement test scores, and
generalizations to situations with other observed values are not
strictly valid. No satisfactory methods for handling these
problems of inference exist.



(4) Units of Analysis. In the usual classroom situation, students
within a class are not exposed independently to the educational
treatment. Therefore, class membership should be taken into
account in the examination of treatment outcomes. When students
are treated in groups, TTI effects have three possible
explanations. They may arise from the individual's response to the
treatment, a class effect, or from a comparative effect within
a class. The examination of between-class and within-class
regressions is helpful in separating these interaction effects.
(See Cronbach & Webb (1975) for an illustration of these
techniques.) In placement situations each of these three
effects can be important for the proper allocation of
educational resources.

(5) Choice of Criterion Variable. The outcome measure chosen is
crucial to the success of the TTI study. The criterion measure
should reflect the instructional objectives and not vary widely
over different classes or schools if these are to be pooled in
the analysis.

Beyond TTI Analysis 

Over the years since Cronbach and Gleser (1965) elaborated possible uses
of decision theory in personnel (and other) common decision problems, much has
been said and written about the potential of such approaches. Because of
theoretical and philosophical issues surrounding applications of decision
theory, however, it has been used little in practice. In education, it has
not been used at all. A need has existed for simple operational procedures
that might embody some of the basic concepts of utility. Davis, Hickman, and
Novick (1973); Hambleton and Novick (1973); and Peterson (1974) have described
utility models for use in both instruction and selection.
A recent paper by Livingston (1974) describes an operational utility-based
approach that may be applicable to the kind of placement situations
considered in the present paper. The decision procedure described by
Livingston is useful when a decision-maker must take one of two possible
actions, say, Accept (A) or Reject (R). Or, the choice may be between
Accelerated (A) or Regular (R) instruction. If the test score cut-off point is
$x_o$ and the criterion "indifference point" is $y_o$,

$u_a(y_i)$ = utility of action A for person $i$,

$u_r(y_i)$ = utility of action R for person $i$,

and

then an increasing utility function $u_a$ and a decreasing utility function
$u_r$ may be imagined as shown in Figure 3. The utility of the decision
procedure is the sum of the utilities of all the individual decisions:

$$U(x_o) = \sum_{x_i \ge x_o} u_a(y_i) + \sum_{x_i < x_o} u_r(y_i).$$

The utility of an ideal procedure is used for comparison (based on knowledge
of actual performance):

$$U(y_o) = \sum_{y_i \ge y_o} u_a(y_i) + \sum_{y_i < y_o} u_r(y_i),$$

and a utility ratio,

$$U_r = \frac{U(x_o)}{U(y_o)},$$

is computed. Note that, unlike correlation and regression coefficients, $U_r$
is a function of both $x_o$ and $y_o$. Thus, unlike correlation and regression
coefficients, $U_r$ is potentially useful to test users not only in evaluating
the usefulness of a particular test for their particular purposes, but also in
setting cut-off points (provided they can define their indifference point,
$y_o$).
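To make the utility ratio concrete, here is a small Python sketch. It assumes, purely for illustration, straight-line utility functions that cross zero at the indifference point, u_a(y) = y - y_o and u_r(y) = y_o - y; the paper's Figure 3 is not reproduced here, so these shapes, the function name `utility_ratio`, and the data are assumptions rather than Livingston's specification. Other utility functions can be passed in.

```python
def utility_ratio(x, y, x_cut, y_indiff, u_a=None, u_r=None):
    """Ratio of the utility of a test-based cut-off to the ideal utility.

    x, y     : test scores and criterion scores, one pair per person
    x_cut    : test score cut-off; action A is taken when x_i >= x_cut
    y_indiff : criterion indifference point
    u_a, u_r : utility functions of y for actions A and R; the linear
               defaults below are an illustrative assumption.
    """
    if u_a is None:
        u_a = lambda v: v - y_indiff   # increasing in y
    if u_r is None:
        u_r = lambda v: y_indiff - v   # decreasing in y
    # Utility of deciding from the test score.
    actual = sum(u_a(yi) if xi >= x_cut else u_r(yi) for xi, yi in zip(x, y))
    # Utility of deciding from actual criterion performance.
    ideal = sum(u_a(yi) if yi >= y_indiff else u_r(yi) for yi in y)
    if ideal == 0:
        raise ValueError("ideal utility is zero; ratio is undefined")
    return actual / ideal


# Hypothetical scores for four students; compare two candidate cut-offs.
scores = [10, 20, 30, 40]
grades = [1.0, 2.5, 1.5, 3.5]
for cut in (25, 35):
    print(cut, utility_ratio(scores, grades, cut, y_indiff=2.0))
```

Evaluating the ratio over a grid of candidate cut-offs and choosing the largest value is one simple way a test user could apply it to setting a cut-off point.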

There are at least two problems with $U_r$. First, it would be very
difficult for test users to construct utility functions like those in Figure 3.
Because of this problem, Livingston suggests that a convention be established,
say, to use simple straight-line functions, unless some reason for doing
otherwise exists. A second problem with $U_r$, as described, is that it assumes
that there are no constraints on the numbers of persons assigned to A or R.
Nevertheless, a decision-maker may be wise to consider the problem initially
without constraints and then to modify the cut-off as suggested by $U_r$ with
possible constraints in mind.